Estimated time needed: 1 hour
In this project, you'll build a question-answering (QA) tool capable of extracting and summarizing information from YouTube videos. Leveraging LangChain and a large language model (LLM), the tool will answer specific questions based on a video's transcript. You'll work with components like video transcript loaders, text processors, embedding models, vector databases, and retrievers, while using Gradio for a user-friendly interface.
With the explosion of online video content, manually searching through lengthy footage is inefficient. This project automates that process, transforming dense transcripts into concise summaries and enabling precise video segment identification using Facebook AI Similarity Search (FAISS). By the end of the project, you'll have developed a powerful system that streamlines how we interact with multimedia data, making video content more accessible and insightful.
First, let's create a virtual environment. A virtual environment allows you to manage dependencies for different projects separately, avoiding conflicts between package versions.
To open a terminal, go to the top menu and click Terminal > New Terminal.
In the terminal of your Cloud IDE, ensure that you are in the path /home/project, then run the following commands to create a Python virtual environment.
```shell
pip install virtualenv
virtualenv my_env            # create a virtual environment named my_env
source my_env/bin/activate   # activate my_env
```
To ensure seamless execution of your scripts, and considering that certain functions within these scripts rely on external libraries, it's essential to install some prerequisite libraries before you begin. For this project, the key libraries you'll need are:
- youtube-transcript-api for extracting transcripts from YouTube videos.
- faiss-cpu for efficient similarity search.
- langchain and langchain-community for text processing and language models.
- ibm-watsonx-ai and langchain-ibm for integrating IBM watsonx services.
- gradio for building the web application interface.

Here's how to install these packages (from your terminal):
```shell
# installing necessary packages in my_env
pip install youtube-transcript-api==1.2.1
pip install faiss-cpu==1.8.0
pip install langchain==0.2.6 | tail -n 1
pip install langchain-community==0.2.6 | tail -n 1
pip install ibm-watsonx-ai==1.0.10 | tail -n 1
pip install langchain_ibm==0.1.8 | tail -n 1
pip install gradio==4.44.1 | tail -n 1
```
The environment is now ready to create the application.
It's time to construct the YouTube bot!
Let's start by creating a new Python file that will store your bot: click File > New Text File, then save the file as ytbot.py.
In the following sections, you will populate ytbot.py with your bot.
Inside ytbot.py, import the following libraries from gradio, re, youtube_transcript_api, ibm_watsonx_ai, langchain_ibm, langchain, and langchain_community. The imported classes are required for initializing models with the correct credentials, splitting text, initializing a vector store, loading YouTube transcripts, generating a question-answer retriever, and building the Gradio interface.
```python
# Import necessary libraries for the YouTube bot
import gradio as gr
import re  # For extracting the video ID
from youtube_transcript_api import YouTubeTranscriptApi  # For extracting transcripts from YouTube videos
from langchain.text_splitter import RecursiveCharacterTextSplitter  # For splitting text into manageable segments
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes  # For specifying model types
from ibm_watsonx_ai import APIClient, Credentials  # For API client and credentials management
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams  # For managing model parameters
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods  # For defining decoding methods
from langchain_ibm import WatsonxLLM, WatsonxEmbeddings  # For interacting with IBM's LLM and embeddings
from ibm_watsonx_ai.foundation_models.utils import get_embedding_model_specs  # For retrieving model specifications
from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes  # For specifying types of embeddings
from langchain_community.vectorstores import FAISS  # For efficient vector storage and similarity search
from langchain.chains import LLMChain  # For creating chains of operations with LLMs
from langchain.prompts import PromptTemplate  # For defining prompt templates
```
YouTube video URLs typically follow this format: https://www.youtube.com/watch?v=VIDEO_ID
The VIDEO_ID is a unique 11-character string that identifies the video. To extract this ID, we'll use a regular expression that captures this 11-character string from the URL.
Define a function to extract the video ID from the provided YouTube URL.
```python
def get_video_id(url):
    # Regex pattern to match YouTube video URLs
    pattern = r'https:\/\/www\.youtube\.com\/watch\?v=([a-zA-Z0-9_-]{11})'
    match = re.search(pattern, url)
    return match.group(1) if match else None
```
The get_video_id() function is designed to extract the unique VIDEO_ID from a YouTube URL.
```python
url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
video_id = get_video_id(url)
print(video_id)  # Output: dQw4w9WgXcQ
```
If the URL matches the YouTube format: The function returns the extracted 11-character VIDEO_ID (for example, dQw4w9WgXcQ).
If the URL does not match: The function returns None.
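A quick self-contained check covers both cases (the function body is repeated here so the snippet runs on its own):

```python
import re

def get_video_id(url):
    # Same pattern as above: capture the 11-character video ID
    pattern = r'https:\/\/www\.youtube\.com\/watch\?v=([a-zA-Z0-9_-]{11})'
    match = re.search(pattern, url)
    return match.group(1) if match else None

print(get_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ"))  # dQw4w9WgXcQ
print(get_video_id("https://example.com/watch?v=notayoutube"))      # None
```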
The YouTubeTranscriptApi allows us to retrieve transcripts (subtitles) for a given video. This function first extracts the video ID from the YouTube URL, then fetches the transcripts available for that video. Transcripts can be either automatically generated or manually provided by the video uploader.
Here's the function to fetch the transcript:
```python
def get_transcript(url):
    # Extract the video ID from the URL
    video_id = get_video_id(url)

    # Create a YouTubeTranscriptApi() object
    ytt_api = YouTubeTranscriptApi()

    # Fetch the list of available transcripts for the given YouTube video
    transcripts = ytt_api.list(video_id)

    transcript = ""
    for t in transcripts:
        # Check if the transcript's language is English
        if t.language_code == 'en':
            if t.is_generated:
                # If no transcript has been set yet, use the auto-generated one
                if len(transcript) == 0:
                    transcript = t.fetch()
            else:
                # If a manually created transcript is found, use it (overrides auto-generated)
                transcript = t.fetch()
                break  # Prioritize the manually created transcript, exit the loop
    return transcript if transcript else None
```
Following is an example of how to call get_transcript() using a sample YouTube video URL:
```python
# Sample YouTube URL
url = "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Fetching the transcript
transcript = get_transcript(url)

# Output the fetched transcript
print(transcript)
```
```json
[
    {
        "text": "We're no strangers to love.",
        "start": 0.0,
        "duration": 3.5
    },
    {
        "text": "You know the rules and so do I.",
        "start": 3.5,
        "duration": 4.0
    },
    {
        "text": "A full commitment's what I'm thinking of.",
        "start": 7.5,
        "duration": 4.0
    }
]
```
If no English transcript is available, the function returns None; an invalid video ID causes the API itself to raise an error.
When we fetch the transcript, it often comes in a structured format. Each entry in the transcript is typically represented as a dictionary with the following structure:
```json
{
    "text": "Transcript text here",
    "start": 0.0,
    "duration": 3.0
}
```
- text: The spoken text from the video.
- start: The time (in seconds) when the text starts in the video.
- duration: The duration (in seconds) for which the text is displayed.

This function transforms the fetched transcript into a more readable format by extracting the text and its corresponding start time.
```python
def process(transcript):
    # Initialize an empty string to hold the formatted transcript
    txt = ""
    # Loop through each entry in the transcript
    for i in transcript:
        try:
            # Append the text and its start time to the output string
            txt += f"Text: {i.text} Start: {i.start}\n"
        except AttributeError:
            # If 'text' or 'start' is missing on this entry, skip it
            pass
    # Return the processed transcript as a single string
    return txt
```
Following is an example of how to call process() using a fetched transcript:
```python
from types import SimpleNamespace

# Sample transcript entries (SimpleNamespace mimics the snippet objects
# returned by youtube-transcript-api, which expose .text, .start, and .duration)
transcript = [
    SimpleNamespace(text="We're no strangers to love.", start=0.0, duration=3.5),
    SimpleNamespace(text="You know the rules and so do I.", start=3.5, duration=4.0),
    SimpleNamespace(text="A full commitment's what I'm thinking of.", start=7.5, duration=4.0),
]

# Processing the transcript
formatted_transcript = process(transcript)

# Output the processed transcript
print(formatted_transcript)
```
The process() function returns a formatted string that contains the text and start time for each entry in the transcript. The output format will look like this:
```text
Text: We're no strangers to love. Start: 0.0
Text: You know the rules and so do I. Start: 3.5
Text: A full commitment's what I'm thinking of. Start: 7.5
```
The RecursiveCharacterTextSplitter from LangChain helps split long transcripts into smaller, more manageable chunks for easier processing. This function takes a processed transcript and breaks it down into specified chunk sizes, with some overlap between chunks to ensure context is preserved across segments. This is useful when handling large texts that need to be processed by models or other tools.
Here's the function to chunk the transcript:
```python
def chunk_transcript(processed_transcript, chunk_size=200, chunk_overlap=20):
    # Initialize the RecursiveCharacterTextSplitter with specified chunk size and overlap
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    # Split the transcript into chunks
    chunks = text_splitter.split_text(processed_transcript)
    return chunks
```
Chunking the transcript: This function splits a large transcript into smaller chunks using a specified chunk_size (default is 200 characters) and an optional chunk_overlap (default is 20 characters) to maintain context across the chunks.
RecursiveCharacterTextSplitter: Ensures the text is split intelligently, without breaking up sentences or paragraphs unnaturally, making it ideal for processing large documents or transcripts.
Below is an example of how to call chunk_transcript() using a processed transcript:
```python
# Sample processed transcript string
processed_transcript = """Text: We're no strangers to love. Start: 0.0
Text: You know the rules and so do I. Start: 3.5
Text: A full commitment's what I'm thinking of. Start: 7.5"""

# Chunking the transcript
chunks = chunk_transcript(processed_transcript)

# Output the chunks
print(chunks)
```
The chunk_transcript() function returns a list of strings, each representing a chunk of the original processed transcript. For example:
```python
[
    "Text: We're no strangers to love. Start: 0.0\nText: You know the rules and so do I. Start: 3.5",
    "Text: You know the rules and so do I. Start: 3.5\nText: A full commitment's what I'm thinking of. Start: 7.5"
]
```
In this output, each chunk contains overlapping segments of the transcript to maintain context, which is useful when processing each chunk with models or tools that require shorter inputs.
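To make the roles of chunk_size and chunk_overlap concrete, here is a minimal sliding-window sketch in plain Python. This is not what RecursiveCharacterTextSplitter does internally (the LangChain splitter also prefers natural break points such as newlines and spaces); it only illustrates how overlapping chunks carry context forward:

```python
def sliding_chunks(text, chunk_size=20, chunk_overlap=5):
    # Advance by (chunk_size - chunk_overlap) characters each step, so the
    # last `chunk_overlap` characters of one chunk reappear at the start
    # of the next chunk.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

text = "The quick brown fox jumps over the lazy dog"
for chunk in sliding_chunks(text):
    print(repr(chunk))
```

Each printed chunk shares its last five characters with the start of the next one, which is exactly the context-preserving effect the overlap parameter provides.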
Set up the necessary credentials to access IBM Watson services. This function initializes the required credentials, client, and project details for interacting with the watsonx model.
```python
def setup_credentials():
    # Define the model ID for the watsonx model being used
    model_id = "meta-llama/llama-3-2-3b-instruct"

    # Set up the credentials by specifying the URL for IBM Watson services
    credentials = Credentials(url="https://us-south.ml.cloud.ibm.com")

    # Create an API client using the credentials
    client = APIClient(credentials)

    # Define the project ID associated with the watsonx platform
    project_id = "skills-network"

    # Return the model ID, credentials, client, and project ID for later use
    return model_id, credentials, client, project_id
```
Configure the parameters for the watsonx model. This function sets up various generation parameters, such as the decoding method and token limits, to customize the behavior of the model during text generation.
```python
def define_parameters():
    # Return a dictionary containing the parameters for the watsonx model
    return {
        # Set the decoding method to GREEDY for generating text
        GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
        # Specify the maximum number of new tokens to generate
        GenParams.MAX_NEW_TOKENS: 900,
    }
```
Instantiate the watsonx LLM for summarization and Q&A tasks. This function initializes the watsonx language model by providing the necessary model ID, credentials, project ID, and parameters for its configuration.
```python
def initialize_watsonx_llm(model_id, credentials, project_id, parameters):
    # Create and return an instance of the WatsonxLLM with the specified configuration
    return WatsonxLLM(
        model_id=model_id,            # Set the model ID for the LLM
        url=credentials.get("url"),   # Retrieve the service URL from credentials
        project_id=project_id,        # Set the project ID for accessing resources
        params=parameters             # Pass the parameters for model behavior
    )
```
This section covers the process of embedding transcript chunks and implementing similarity search using FAISS.
We use the IBM SLATE-30M (ENG) model to generate embeddings for the transcript chunks. This function initializes the embedding model, which converts the textual data into numerical vectors that can be utilized for various natural language processing tasks, such as similarity calculations, clustering, and machine learning model training.
Embeddings are dense vector representations of text, where similar pieces of text are mapped to nearby points in the vector space. This allows models to understand the semantic relationships between words and phrases, making embeddings crucial for tasks like information retrieval, clustering, and classification.
The SLATE-30M model is specifically designed for English language embeddings, providing high-quality representations that capture the semantic meaning of the input text.
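As a rough intuition for "nearby points in the vector space," here is a toy sketch using made-up three-dimensional vectors (real SLATE-30M embeddings have hundreds of dimensions and come from the model, not by hand). Cosine similarity close to 1.0 means the vectors, and hence the texts they represent, are similar:

```python
import math

# Hypothetical toy "embeddings" for three short texts -- illustrative values
# only, not actual model output.
embeddings = {
    "cats are cute":        [0.9, 0.1, 0.2],
    "kittens are adorable": [0.85, 0.15, 0.25],
    "stock prices fell":    [0.1, 0.9, 0.7],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = embeddings["cats are cute"]
for text, vec in embeddings.items():
    print(f"{text}: {cosine_similarity(query, vec):.3f}")
```

The two cat-related sentences score much closer to 1.0 with each other than either does with the unrelated sentence, which is the property similarity search relies on.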
```python
def setup_embedding_model(credentials, project_id):
    # Create and return an instance of WatsonxEmbeddings with the specified configuration
    return WatsonxEmbeddings(
        model_id=EmbeddingTypes.IBM_SLATE_30M_ENG.value,  # Set the model ID for the SLATE-30M embedding model
        url=credentials["url"],                           # Retrieve the service URL from the provided credentials
        project_id=project_id                             # Set the project ID for accessing resources in the Watson environment
    )
```
FAISS (Facebook AI Similarity Search) is a library designed for efficient similarity search and clustering of dense vectors, enabling rapid retrieval of nearest neighbors in high-dimensional spaces. This function creates a FAISS index from a list of text chunks using the specified embedding model. By converting text chunks into embeddings and indexing them with FAISS, we can quickly find the most similar chunks based on cosine similarity or other distance metrics. This is particularly useful in applications such as information retrieval, recommendation systems, and natural language understanding.
The function takes two parameters:
- chunks: A list containing the text chunks that have been processed and are ready for indexing. These chunks typically represent segments of text that need to be compared or searched.
- embedding_model: The model used to generate embeddings for the text chunks. This model transforms the chunks into numerical vectors that represent their semantic meaning.
```python
def create_faiss_index(chunks, embedding_model):
    """
    Create a FAISS index from text chunks using the specified embedding model.

    :param chunks: List of text chunks
    :param embedding_model: The embedding model to use
    :return: FAISS index
    """
    # Use the FAISS library to create an index from the provided text chunks
    return FAISS.from_texts(chunks, embedding_model)
```
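To see what the index buys us, here is the brute-force search that FAISS is optimized to avoid, written in plain Python over hypothetical two-dimensional vectors. FAISS returns the same kind of result (the k nearest chunks) but scales to millions of high-dimensional vectors:

```python
import math

# Hypothetical chunk embeddings -- illustrative values only.
chunk_vectors = [
    ("chunk about cooking", [0.1, 0.8]),
    ("chunk about baking",  [0.2, 0.9]),
    ("chunk about sports",  [0.9, 0.1]),
]

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_search(query_vec, k=2):
    """Return the k chunks whose vectors lie closest to the query vector."""
    ranked = sorted(chunk_vectors, key=lambda cv: euclidean(query_vec, cv[1]))
    return [text for text, _ in ranked[:k]]

# A query vector near the food-related chunks retrieves those chunks first
print(brute_force_search([0.15, 0.85]))
```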
In this section, we will search for specific queries within the embedded transcript using the FAISS index. The function takes a query and finds the top k most similar text chunks from the indexed embeddings. This is useful for retrieving relevant information based on user queries or for identifying related content within a larger dataset.
The function takes the following parameters:
faiss_index: The FAISS index created from the embedded transcript chunks. This index allows for efficient similarity searches based on vector representations.
query: The text input for which we want to find similar chunks. This could be a user question or a topic of interest.
k: An optional parameter that specifies the number of similar results to return (default is 3).
```python
def perform_similarity_search(faiss_index, query, k=3):
    """
    Search for specific queries within the embedded transcript using the FAISS index.

    :param faiss_index: The FAISS index containing embedded text chunks
    :param query: The text input for the similarity search
    :param k: The number of similar results to return (default is 3)
    :return: List of similar results
    """
    # Perform the similarity search using the FAISS index
    results = faiss_index.similarity_search(query, k=k)
    return results
```
In this section, we define a prompt template for the language model to summarize a YouTube video transcript. The prompt serves as a structured instruction that guides the model in generating a coherent summary. The template includes placeholders for dynamic content, specifically the transcript of the video.
The function returns a PromptTemplate object configured to accept the transcript as an input variable. This ensures that when the prompt is utilized, it can seamlessly incorporate the specific transcript text that needs summarization. The summary generated by the model should focus on the key points, omitting any timestamps present in the original transcript.
```python
def create_summary_prompt():
    """
    Create a PromptTemplate for summarizing a YouTube video transcript.

    :return: PromptTemplate object
    """
    # Define the template for the summary prompt
    template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are an AI assistant tasked with summarizing YouTube video transcripts. Provide concise, informative summaries that capture the main points of the video content.

Instructions:
1. Summarize the transcript in a single concise paragraph.
2. Ignore any timestamps in your summary.
3. Focus on the spoken content (Text) of the video.

Note: In the transcript, "Text" refers to the spoken words in the video, and "start" indicates the timestamp when that part begins in the video.

<|eot_id|><|start_header_id|>user<|end_header_id|>

Please summarize the following YouTube video transcript:

{transcript}

<|eot_id|><|start_header_id|>assistant<|end_header_id|>
"""

    # Create the PromptTemplate object with the defined template
    prompt = PromptTemplate(
        input_variables=["transcript"],
        template=template
    )
    return prompt
```
In this section, we create an LLMChain for generating summaries from the provided transcript using the defined prompt template. An LLMChain is a construct that combines a language model with a specific prompt, facilitating the process of generating text outputs based on the input data.
The function accepts the following parameters:
llm: An instance of the language model that will be used for summarization. This model processes the input and generates the summary.
prompt: A PromptTemplate instance that contains the structured prompt for the model. It guides the LLM in generating a concise summary of the transcript.
verbose: A boolean parameter that determines whether to enable verbose output during the summary generation process (default is True).
This function returns an LLMChain instance, which can then be used to generate summaries efficiently.
```python
def create_summary_chain(llm, prompt, verbose=True):
    """
    Create an LLMChain for generating summaries.

    :param llm: Language model instance
    :param prompt: PromptTemplate instance
    :param verbose: Boolean to enable verbose output (default: True)
    :return: LLMChain instance
    """
    return LLMChain(llm=llm, prompt=prompt, verbose=verbose)
```
This function retrieves relevant context from the FAISS index based on the user's query. It leverages the similarity search capabilities of FAISS to find the most pertinent documents or chunks that relate to the input query.
Parameters:

- query (str): The user's query string that specifies what information is being sought.
- faiss_index (FAISS): The FAISS index containing the embedded documents, which allows for efficient similarity searches.
- k (int, optional, default=7): The number of most relevant documents to retrieve.

Returns:

- list: A list of the k most relevant documents (or document chunks) that match the query.

Function workflow: the function performs a similarity search against the FAISS index and returns the k most similar documents or chunks to the query.
```python
def retrieve(query, faiss_index, k=7):
    """
    Retrieve relevant context from the FAISS index based on the user's query.

    Parameters:
    query (str): The user's query string.
    faiss_index (FAISS): The FAISS index containing the embedded documents.
    k (int, optional): The number of most relevant documents to retrieve (default is 7).

    Returns:
    list: A list of the k most relevant documents (or document chunks).
    """
    relevant_context = faiss_index.similarity_search(query, k=k)
    return relevant_context
```
This function structures the prompt for answering questions based on the video content. It is designed to guide the AI in providing accurate and detailed responses to user queries.
Returns:

- PromptTemplate: A configured PromptTemplate object for Q&A tasks.

Function workflow: the function defines a template string with placeholders for the context and question, creates a PromptTemplate object with that template and the required input variables, and returns the PromptTemplate.
```python
def create_qa_prompt_template():
    """
    Create a PromptTemplate for question answering based on video content.

    Returns:
    PromptTemplate: A PromptTemplate object configured for Q&A tasks.
    """
    # Define the template string
    qa_template = """You are an expert assistant providing detailed answers based on the following video content.

Relevant Video Context: {context}

Based on the above context, please answer the following question:
Question: {question}"""

    # Create the PromptTemplate object
    prompt_template = PromptTemplate(
        input_variables=["context", "question"],
        template=qa_template
    )
    return prompt_template
```
Following is an example of how to call create_qa_prompt_template() to create a prompt template for Q&A tasks:
```python
# Creating the Q&A prompt template
qa_prompt_template = create_qa_prompt_template()

# Example of how to use the prompt template with context and a question
context = "This video explains the fundamentals of quantum physics."
question = "What are the key principles discussed in the video?"

# Generating the prompt
generated_prompt = qa_prompt_template.format(context=context, question=question)

# Output the generated prompt
print(generated_prompt)
```
The create_qa_prompt_template() function returns a PromptTemplate object, which can be used to format the context and question. The output from the format() method will look like this:
```text
You are an expert assistant providing detailed answers based on the following video content.

Relevant Video Context: This video explains the fundamentals of quantum physics.

Based on the above context, please answer the following question:
Question: What are the key principles discussed in the video?
```
This section demonstrates how to instantiate an LLMChain for generating answers to questions based on a given prompt template and language model.
Returns:

- LLMChain: An instantiated LLMChain ready for question answering.

Function workflow: the function creates an LLMChain that combines the language model and the prompt, then returns the LLMChain object.
```python
def create_qa_chain(llm, prompt_template, verbose=True):
    """
    Create an LLMChain for question answering.

    Args:
    llm: Language model instance
        The language model to use in the chain (e.g., WatsonxGranite).
    prompt_template: PromptTemplate
        The prompt template to use for structuring inputs to the language model.
    verbose: bool, optional (default=True)
        Whether to enable verbose output for the chain.

    Returns:
    LLMChain: An instantiated LLMChain ready for question answering.
    """
    return LLMChain(llm=llm, prompt=prompt_template, verbose=verbose)
```
This section demonstrates how to retrieve relevant context from a FAISS index and generate an answer based on user input.
Returns:

- str: The generated answer to the user's question.

Function workflow: the function retrieves relevant context from the FAISS index, then uses the question-answering chain (LLMChain) to generate an answer based on the retrieved context and the user's question.
```python
def generate_answer(question, faiss_index, qa_chain, k=7):
    """
    Retrieve relevant context and generate an answer based on user input.

    Args:
    question: str
        The user's question.
    faiss_index: FAISS
        The FAISS index containing the embedded documents.
    qa_chain: LLMChain
        The question-answering chain (LLMChain) to use for generating answers.
    k: int, optional (default=7)
        The number of relevant documents to retrieve.

    Returns:
    str: The generated answer to the user's question.
    """
    # Retrieve relevant context
    relevant_context = retrieve(question, faiss_index, k=k)

    # Generate answer using the QA chain
    answer = qa_chain.predict(context=relevant_context, question=question)
    return answer
```
This function generates a summary of a video using the preprocessed transcript. It uses IBM Watson's services to create an effective summary, ensuring that if the transcript hasn't been fetched yet, it fetches it first.
Returns:

- str: The generated summary of the video or a message indicating that no transcript is available.

Function workflow: the function fetches and preprocesses the transcript, stores the result in the processed_transcript global variable, sets up the watsonx LLM, and runs the summary chain on the processed transcript.
```python
# Initialize an empty string to store the processed transcript after fetching and preprocessing
processed_transcript = ""

def summarize_video(video_url):
    """
    Title: Summarize Video

    Description:
    This function generates a summary of the video using the preprocessed transcript.
    If the transcript hasn't been fetched yet, it fetches it first.

    Args:
    video_url (str): The URL of the YouTube video from which the transcript is to be fetched.

    Returns:
    str: The generated summary of the video or a message indicating that no transcript is available.
    """
    global fetched_transcript, processed_transcript

    if video_url:
        # Fetch and preprocess transcript
        fetched_transcript = get_transcript(video_url)
        processed_transcript = process(fetched_transcript)
    else:
        return "Please provide a valid YouTube URL."

    if processed_transcript:
        # Step 1: Set up IBM Watson credentials
        model_id, credentials, client, project_id = setup_credentials()

        # Step 2: Initialize watsonx LLM for summarization
        llm = initialize_watsonx_llm(model_id, credentials, project_id, define_parameters())

        # Step 3: Create the summary prompt and chain
        summary_prompt = create_summary_prompt()
        summary_chain = create_summary_chain(llm, summary_prompt)

        # Step 4: Generate the video summary
        summary = summary_chain.run({"transcript": processed_transcript})
        return summary
    else:
        return "No transcript available. Please fetch the transcript first."
```
This function retrieves relevant context from the FAISS index based on the user's query and generates an answer using the preprocessed transcript. It first checks if the transcript has been fetched; if not, it fetches and processes the transcript from the provided YouTube video URL.
If the transcript is available and a user question is provided, the function proceeds to chunk the transcript for better context retrieval. It then sets up IBM Watson credentials and initializes the watsonx LLM specifically for Q&A tasks.
Next, it creates a FAISS index using the chunked transcript and sets up the Q&A prompt template and chain. Finally, it generates an answer to the user's question using the FAISS index and returns the answer. If the transcript hasn't been fetched or if the user fails to provide a valid question, the function returns a relevant message indicating the issue.
```python
def answer_question(video_url, user_question):
    """
    Title: Answer User's Question

    Description:
    This function retrieves relevant context from the FAISS index based on the user's query
    and generates an answer using the preprocessed transcript.
    If the transcript hasn't been fetched yet, it fetches it first.

    Args:
    video_url (str): The URL of the YouTube video from which the transcript is to be fetched.
    user_question (str): The question posed by the user regarding the video.

    Returns:
    str: The answer to the user's question or a message indicating that the transcript
    has not been fetched.
    """
    global fetched_transcript, processed_transcript

    # Check if the transcript needs to be fetched
    if not processed_transcript:
        if video_url:
            # Fetch and preprocess transcript
            fetched_transcript = get_transcript(video_url)
            processed_transcript = process(fetched_transcript)
        else:
            return "Please provide a valid YouTube URL."

    if processed_transcript and user_question:
        # Step 1: Chunk the transcript (only for Q&A)
        chunks = chunk_transcript(processed_transcript)

        # Step 2: Set up IBM Watson credentials
        model_id, credentials, client, project_id = setup_credentials()

        # Step 3: Initialize watsonx LLM for Q&A
        llm = initialize_watsonx_llm(model_id, credentials, project_id, define_parameters())

        # Step 4: Create FAISS index for transcript chunks (only needed for Q&A)
        embedding_model = setup_embedding_model(credentials, project_id)
        faiss_index = create_faiss_index(chunks, embedding_model)

        # Step 5: Set up the Q&A prompt and chain
        qa_prompt = create_qa_prompt_template()
        qa_chain = create_qa_chain(llm, qa_prompt)

        # Step 6: Generate the answer using the FAISS index
        answer = generate_answer(user_question, faiss_index, qa_chain)
        return answer
    else:
        return "Please provide a valid question and ensure the transcript has been fetched."
```
This section describes the setup of a Gradio interface for interacting with a YouTube video, allowing users to fetch its transcript, summarize it, or ask questions based on the content of the video. The interface is built using Gradio's Blocks API, which facilitates the creation of interactive web applications with minimal code.
```python
with gr.Blocks() as interface:
    # Input field for YouTube URL
    video_url = gr.Textbox(label="YouTube Video URL", placeholder="Enter the YouTube Video URL")

    # Outputs for summary and answer
    summary_output = gr.Textbox(label="Video Summary", lines=5)
    question_input = gr.Textbox(label="Ask a Question About the Video", placeholder="Ask your question")
    answer_output = gr.Textbox(label="Answer to Your Question", lines=5)

    # Buttons for selecting functionalities after fetching transcript
    summarize_btn = gr.Button("Summarize Video")
    question_btn = gr.Button("Ask a Question")

    # Display status message for transcript fetch
    transcript_status = gr.Textbox(label="Transcript Status", interactive=False)

    # Set up button actions
    summarize_btn.click(summarize_video, inputs=video_url, outputs=summary_output)
    question_btn.click(answer_question, inputs=[video_url, question_input], outputs=answer_output)

# Launch the app with specified server name and port
interface.launch(server_name="0.0.0.0", server_port=7860)
```
Input field: A textbox (video_url) is created for users to enter the URL of the YouTube video they want to analyze.
Output fields: Two additional textboxes, summary_output and answer_output, are used to display the generated summary and answers to user questions, respectively. A question_input textbox allows users to type their queries regarding the video content.
Buttons: Two buttons, summarize_btn and question_btn, are included to allow users to trigger the summarization of the video or to ask a specific question about it.
Transcript status: A textbox (transcript_status) displays feedback to the user regarding the status of the transcript fetching process, indicating whether it was successful or if there were issues (for example, invalid URL).
Button actions: The summarize_btn is linked to the summarize_video function, which takes the YouTube URL as input and returns the summary to summary_output. The question_btn is linked to the answer_question function, which takes both the YouTube URL and user question as inputs and returns the answer to answer_output.
Launch configuration: Finally, the interface.launch(server_name="0.0.0.0", server_port=7860) line starts the Gradio application, making it accessible from a web browser on port 7860.
In this section, you'll find the full, consolidated code for the application, which includes all code snippets provided in previous steps. Use this as a reference to ensure that your implementation is consistent with the complete code structure required for the application to function as intended.
# Import necessary libraries for the YouTube bot
import gradio as gr
import re  # For extracting the video id
from youtube_transcript_api import YouTubeTranscriptApi  # For extracting transcripts from YouTube videos
from langchain.text_splitter import RecursiveCharacterTextSplitter  # For splitting text into manageable segments
from ibm_watsonx_ai.foundation_models.utils.enums import ModelTypes  # For specifying model types
from ibm_watsonx_ai import APIClient, Credentials  # For API client and credentials management
from ibm_watsonx_ai.metanames import GenTextParamsMetaNames as GenParams  # For managing model parameters
from ibm_watsonx_ai.foundation_models.utils.enums import DecodingMethods  # For defining decoding methods
from langchain_ibm import WatsonxLLM, WatsonxEmbeddings  # For interacting with IBM's LLM and embeddings
from ibm_watsonx_ai.foundation_models.utils import get_embedding_model_specs  # For retrieving model specifications
from ibm_watsonx_ai.foundation_models.utils.enums import EmbeddingTypes  # For specifying types of embeddings
from langchain_community.vectorstores import FAISS  # For efficient vector storage and similarity search
from langchain.chains import LLMChain  # For creating chains of operations with LLMs
from langchain.prompts import PromptTemplate  # For defining prompt templates


def get_video_id(url):
    # Regex pattern to match YouTube video URLs
    pattern = r'https:\/\/www\.youtube\.com\/watch\?v=([a-zA-Z0-9_-]{11})'
    match = re.search(pattern, url)
    return match.group(1) if match else None


def get_transcript(url):
    # Extract the video ID from the URL
    video_id = get_video_id(url)

    # Create a YouTubeTranscriptApi() object
    ytt_api = YouTubeTranscriptApi()

    # Fetch the list of available transcripts for the given YouTube video
    transcripts = ytt_api.list(video_id)

    transcript = ""
    for t in transcripts:
        # Check if the transcript's language is English
        if t.language_code == 'en':
            if t.is_generated:
                # If no transcript has been set yet, use the auto-generated one
                if len(transcript) == 0:
                    transcript = t.fetch()
            else:
                # If a manually created transcript is found, use it (overrides auto-generated)
                transcript = t.fetch()
                break  # Prioritize the manually created transcript, exit the loop
    return transcript if transcript else None


def process(transcript):
    # Initialize an empty string to hold the formatted transcript
    txt = ""
    # Loop through each entry in the transcript
    for i in transcript:
        try:
            # Append the text and its start time to the output string
            # txt += f"Text: {i['text']} Start: {i['start']}\n"
            txt += f"Text: {i.text} Start: {i.start}\n"
        except (KeyError, AttributeError):
            # If there is an issue accessing 'text' or 'start', skip this entry
            pass
    # Return the processed transcript as a single string
    return txt


def chunk_transcript(processed_transcript, chunk_size=200, chunk_overlap=20):
    # Initialize the RecursiveCharacterTextSplitter with specified chunk size and overlap
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    # Split the transcript into chunks
    chunks = text_splitter.split_text(processed_transcript)
    return chunks


def setup_credentials():
    # Define the model ID for the WatsonX model being used
    model_id = "meta-llama/llama-3-2-3b-instruct"
    # Set up the credentials by specifying the URL for IBM Watson services
    credentials = Credentials(url="https://us-south.ml.cloud.ibm.com")
    # Create an API client using the credentials
    client = APIClient(credentials)
    # Define the project ID associated with the WatsonX platform
    project_id = "skills-network"
    # Return the model ID, credentials, client, and project ID for later use
    return model_id, credentials, client, project_id


def define_parameters():
    # Return a dictionary containing the parameters for the WatsonX model
    return {
        # Set the decoding method to GREEDY for generating text
        GenParams.DECODING_METHOD: DecodingMethods.GREEDY,
        # Specify the maximum number of new tokens to generate
        GenParams.MAX_NEW_TOKENS: 900,
    }


def initialize_watsonx_llm(model_id, credentials, project_id, parameters):
    # Create and return an instance of the WatsonxLLM with the specified configuration
    return WatsonxLLM(
        model_id=model_id,  # Set the model ID for the LLM
        url=credentials.get("url"),  # Retrieve the service URL from credentials
        project_id=project_id,  # Set the project ID for accessing resources
        params=parameters  # Pass the parameters for model behavior
    )


def setup_embedding_model(credentials, project_id):
    # Create and return an instance of WatsonxEmbeddings with the specified configuration
    return WatsonxEmbeddings(
        model_id=EmbeddingTypes.IBM_SLATE_30M_ENG.value,  # Set the model ID for the SLATE-30M embedding model
        url=credentials["url"],  # Retrieve the service URL from the provided credentials
        project_id=project_id  # Set the project ID for accessing resources in the Watson environment
    )


def create_faiss_index(chunks, embedding_model):
    """
    Create a FAISS index from text chunks using the specified embedding model.
    :param chunks: List of text chunks
    :param embedding_model: The embedding model to use
    :return: FAISS index
    """
    # Use the FAISS library to create an index from the provided text chunks
    return FAISS.from_texts(chunks, embedding_model)


def perform_similarity_search(faiss_index, query, k=3):
    """
    Search for specific queries within the embedded transcript using the FAISS index.
    :param faiss_index: The FAISS index containing embedded text chunks
    :param query: The text input for the similarity search
    :param k: The number of similar results to return (default is 3)
    :return: List of similar results
    """
    # Perform the similarity search using the FAISS index
    results = faiss_index.similarity_search(query, k=k)
    return results


def create_summary_prompt():
    """
    Create a PromptTemplate for summarizing a YouTube video transcript.
    :return: PromptTemplate object
    """
    # Define the template for the summary prompt
    template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are an AI assistant tasked with summarizing YouTube video transcripts. Provide concise, informative summaries that capture the main points of the video content.
Instructions:
1. Summarize the transcript in a single concise paragraph.
2. Ignore any timestamps in your summary.
3. Focus on the spoken content (Text) of the video.
Note: In the transcript, "Text" refers to the spoken words in the video, and "start" indicates the timestamp when that part begins in the video.<|eot_id|><|start_header_id|>user<|end_header_id|>
Please summarize the following YouTube video transcript:
{transcript}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

    # Create the PromptTemplate object with the defined template
    prompt = PromptTemplate(
        input_variables=["transcript"],
        template=template
    )
    return prompt


def create_summary_chain(llm, prompt, verbose=True):
    """
    Create an LLMChain for generating summaries.
    :param llm: Language model instance
    :param prompt: PromptTemplate instance
    :param verbose: Boolean to enable verbose output (default: True)
    :return: LLMChain instance
    """
    return LLMChain(llm=llm, prompt=prompt, verbose=verbose)


def retrieve(query, faiss_index, k=7):
    """
    Retrieve relevant context from the FAISS index based on the user's query.

    Parameters:
    query (str): The user's query string.
    faiss_index (FAISS): The FAISS index containing the embedded documents.
    k (int, optional): The number of most relevant documents to retrieve (default is 7).

    Returns:
    list: A list of the k most relevant documents (or document chunks).
    """
    relevant_context = faiss_index.similarity_search(query, k=k)
    return relevant_context


def create_qa_prompt_template():
    """
    Create a PromptTemplate for question answering based on video content.

    Returns:
    PromptTemplate: A PromptTemplate object configured for Q&A tasks.
    """
    # Define the template string
    qa_template = """<|begin_of_text|><|start_header_id|>system<|end_header_id|>
You are an expert assistant providing detailed and accurate answers based on the following video content. Your responses should be:
1. Precise and free from repetition
2. Consistent with the information provided in the video
3. Well-organized and easy to understand
4. Focused on addressing the user's question directly
If you encounter conflicting information in the video content, use your best judgment to provide the most likely correct answer based on context.
Note: In the transcript, "Text" refers to the spoken words in the video, and "start" indicates the timestamp when that part begins in the video.<|eot_id|><|start_header_id|>user<|end_header_id|>
Relevant Video Context: {context}
Based on the above context, please answer the following question:
{question}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"""

    # Create the PromptTemplate object
    prompt_template = PromptTemplate(
        input_variables=["context", "question"],
        template=qa_template
    )
    return prompt_template


def create_qa_chain(llm, prompt_template, verbose=True):
    """
    Create an LLMChain for question answering.

    Args:
    llm: Language model instance
        The language model to use in the chain (e.g., WatsonxGranite).
    prompt_template: PromptTemplate
        The prompt template to use for structuring inputs to the language model.
    verbose: bool, optional (default=True)
        Whether to enable verbose output for the chain.

    Returns:
    LLMChain: An instantiated LLMChain ready for question answering.
    """
    return LLMChain(llm=llm, prompt=prompt_template, verbose=verbose)


def generate_answer(question, faiss_index, qa_chain, k=7):
    """
    Retrieve relevant context and generate an answer based on user input.

    Args:
    question: str
        The user's question.
    faiss_index: FAISS
        The FAISS index containing the embedded documents.
    qa_chain: LLMChain
        The question-answering chain (LLMChain) to use for generating answers.
    k: int, optional (default=7)
        The number of relevant documents to retrieve.

    Returns:
    str: The generated answer to the user's question.
    """
    # Retrieve relevant context
    relevant_context = retrieve(question, faiss_index, k=k)
    # Generate answer using the QA chain
    answer = qa_chain.predict(context=relevant_context, question=question)
    return answer


# Initialize an empty string to store the processed transcript after fetching and preprocessing
processed_transcript = ""


def summarize_video(video_url):
    """
    Title: Summarize Video
    Description:
    This function generates a summary of the video using the preprocessed transcript.
    If the transcript hasn't been fetched yet, it fetches it first.

    Args:
    video_url (str): The URL of the YouTube video from which the transcript is to be fetched.

    Returns:
    str: The generated summary of the video or a message indicating that no transcript is available.
    """
    global fetched_transcript, processed_transcript

    if video_url:
        # Fetch and preprocess transcript
        fetched_transcript = get_transcript(video_url)
        processed_transcript = process(fetched_transcript)
    else:
        return "Please provide a valid YouTube URL."

    if processed_transcript:
        # Step 1: Set up IBM Watson credentials
        model_id, credentials, client, project_id = setup_credentials()

        # Step 2: Initialize WatsonX LLM for summarization
        llm = initialize_watsonx_llm(model_id, credentials, project_id, define_parameters())

        # Step 3: Create the summary prompt and chain
        summary_prompt = create_summary_prompt()
        summary_chain = create_summary_chain(llm, summary_prompt)

        # Step 4: Generate the video summary
        summary = summary_chain.run({"transcript": processed_transcript})
        return summary
    else:
        return "No transcript available. Please fetch the transcript first."


def answer_question(video_url, user_question):
    """
    Title: Answer User's Question
    Description:
    This function retrieves relevant context from the FAISS index based on the user's query
    and generates an answer using the preprocessed transcript.
    If the transcript hasn't been fetched yet, it fetches it first.

    Args:
    video_url (str): The URL of the YouTube video from which the transcript is to be fetched.
    user_question (str): The question posed by the user regarding the video.

    Returns:
    str: The answer to the user's question or a message indicating that the transcript
    has not been fetched.
    """
    global fetched_transcript, processed_transcript

    # Check if the transcript needs to be fetched
    if not processed_transcript:
        if video_url:
            # Fetch and preprocess transcript
            fetched_transcript = get_transcript(video_url)
            processed_transcript = process(fetched_transcript)
        else:
            return "Please provide a valid YouTube URL."

    if processed_transcript and user_question:
        # Step 1: Chunk the transcript (only for Q&A)
        chunks = chunk_transcript(processed_transcript)

        # Step 2: Set up IBM Watson credentials
        model_id, credentials, client, project_id = setup_credentials()

        # Step 3: Initialize WatsonX LLM for Q&A
        llm = initialize_watsonx_llm(model_id, credentials, project_id, define_parameters())

        # Step 4: Create FAISS index for transcript chunks (only needed for Q&A)
        embedding_model = setup_embedding_model(credentials, project_id)
        faiss_index = create_faiss_index(chunks, embedding_model)

        # Step 5: Set up the Q&A prompt and chain
        qa_prompt = create_qa_prompt_template()
        qa_chain = create_qa_chain(llm, qa_prompt)

        # Step 6: Generate the answer using FAISS index
        answer = generate_answer(user_question, faiss_index, qa_chain)
        return answer
    else:
        return "Please provide a valid question and ensure the transcript has been fetched."


with gr.Blocks() as interface:
    gr.Markdown("<h2 style='text-align: center;'>YouTube Video Summarizer and Q&A</h2>")

    # Input field for YouTube URL
    video_url = gr.Textbox(label="YouTube Video URL", placeholder="Enter the YouTube Video URL")

    # Outputs for summary and answer
    summary_output = gr.Textbox(label="Video Summary", lines=5)
    question_input = gr.Textbox(label="Ask a Question About the Video", placeholder="Ask your question")
    answer_output = gr.Textbox(label="Answer to Your Question", lines=5)

    # Buttons for selecting functionalities after fetching transcript
    summarize_btn = gr.Button("Summarize Video")
    question_btn = gr.Button("Ask a Question")

    # Display status message for transcript fetch
    transcript_status = gr.Textbox(label="Transcript Status", interactive=False)

    # Set up button actions
    summarize_btn.click(summarize_video, inputs=video_url, outputs=summary_output)
    question_btn.click(answer_question, inputs=[video_url, question_input], outputs=answer_output)

# Launch the app with specified server name and port
interface.launch(server_name="0.0.0.0", server_port=7860)
To serve the application, run the following command in your terminal:
python3.11 ytbot.py
If you do not have an open terminal, you can launch one by going to Terminal > New Terminal. If you launch a new terminal, don't forget to activate the virtual environment you created at the beginning of the project before running the line above:
source my_env/bin/activate # activate my_env
You are now ready to launch the served application! Once ytbot.py is running, open the application in your web browser; it is served on port 7860, per the interface.launch(server_name="0.0.0.0", server_port=7860) call.
To test the application, you can use the YouTube video link https://www.youtube.com/watch?v=T-D1OfcDW1M. This video offers a high-level introduction to RAG from a trusted source and can help ground the LLM's responses, reducing the likelihood of hallucinations.
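As a quick sanity check before running the full app, the same regular expression used by get_video_id in ytbot.py extracts the 11-character video ID from this URL:

```python
import re

# The watch-URL pattern from ytbot.py's get_video_id
pattern = r'https:\/\/www\.youtube\.com\/watch\?v=([a-zA-Z0-9_-]{11})'

def get_video_id(url):
    # Return the captured 11-character ID, or None if the URL doesn't match
    match = re.search(pattern, url)
    return match.group(1) if match else None

print(get_video_id("https://www.youtube.com/watch?v=T-D1OfcDW1M"))  # T-D1OfcDW1M
```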
After summarizing the video, you can engage further by asking specific questions:
Question: How does one reduce hallucinations?
This question can’t be answered accurately without context, as the term ‘hallucination’ can refer either to a psychological condition in humans or to the generation of false or misleading outputs by large language models (LLMs). Fortunately, in this case, we have a video transcript that provides the necessary context. To confirm this, simply paste the question into the Ask a Question About the Video input field and click the Ask a Question button.
Question: Which problems does RAG solve, according to the video?
In this case, we are asking for information that is specifically contained in the video. To obtain a context-aware response, paste the question into the Ask a Question About the Video input field and click the Ask a Question button.
In this lab, you explored the use of AI and NLP techniques to fetch, summarize, and ask questions about YouTube videos. You learned how to fetch and preprocess video transcripts, chunk and embed them, build a FAISS index for similarity search, and combine prompt templates with an LLM to power summarization and Q&A through a Gradio interface.
You've made great progress, and if you missed anything, don't worry! You can always come back and do the lab again to reinforce your understanding.
Now that you've gained hands-on experience with video summarization and Q&A, here are some ideas for further exploration:
Try asking different questions: Experiment with asking new types of questions based on the video you already used. For instance, you can ask about specific timestamps, deeper insights on discussed topics, or further clarifications.
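Because process() in ytbot.py formats each caption as "Text: ... Start: ...", the chunks fed to the model retain timestamps, which is why timestamp questions can work. You can also locate a keyword's timestamps yourself with a small helper; this is a sketch, and find_keyword is not part of the lab code:

```python
import re

# Each line produced by ytbot.py's process() looks like: "Text: <words> Start: <seconds>"
LINE_RE = re.compile(r'Text: (.*?) Start: ([\d.]+)')

def find_keyword(processed_transcript, keyword):
    """Return (start_time, text) pairs whose text mentions the keyword."""
    hits = []
    for match in LINE_RE.finditer(processed_transcript):
        text, start = match.group(1), float(match.group(2))
        if keyword.lower() in text.lower():
            hits.append((start, text))
    return hits

sample = "Text: RAG stands for retrieval augmented generation Start: 12.5\nText: hello everyone Start: 30.0\n"
print(find_keyword(sample, "rag"))  # [(12.5, 'RAG stands for retrieval augmented generation')]
```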
Use a different video: Test the application with a new video. Simply input a different YouTube URL and see how well the summarizer and Q&A tool handle new content. This will help you assess the model's adaptability to different video topics and formats.
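One caveat when trying other videos: the lab's regex only recognizes standard https://www.youtube.com/watch?v=... links. If you want to accept short youtu.be links as well, a broadened pattern could look like the sketch below; this is a suggested tweak, not part of the consolidated code:

```python
import re

# Extended pattern (an assumption, not the lab's original): matches both
# standard watch?v= URLs and short youtu.be links
PATTERN = r'(?:youtube\.com\/watch\?v=|youtu\.be\/)([a-zA-Z0-9_-]{11})'

def get_video_id(url):
    match = re.search(PATTERN, url)
    return match.group(1) if match else None

print(get_video_id("https://youtu.be/T-D1OfcDW1M"))                 # T-D1OfcDW1M
print(get_video_id("https://www.youtube.com/watch?v=T-D1OfcDW1M"))  # T-D1OfcDW1M
```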
Enhance the application: Consider adding new features such as sentiment analysis on the video transcript or enabling the tool to summarize videos in different languages.
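A sentiment-analysis feature, for instance, could reuse the existing chain machinery with a new prompt. A minimal sketch of such a prompt builder follows; SENTIMENT_TEMPLATE and build_sentiment_prompt are hypothetical names, and the actual LLM call is omitted:

```python
# Hypothetical prompt for classifying transcript sentiment; you could wrap this
# in a PromptTemplate and LLMChain exactly like the summary prompt in ytbot.py.
SENTIMENT_TEMPLATE = (
    "You are an AI assistant that classifies the overall sentiment of a "
    "YouTube video transcript as positive, negative, or neutral.\n\n"
    "Transcript:\n{transcript}\n\n"
    "Answer with a single word: positive, negative, or neutral."
)

def build_sentiment_prompt(transcript: str) -> str:
    # Fill the transcript into the template before sending it to the model
    return SENTIMENT_TEMPLATE.format(transcript=transcript)

prompt = build_sentiment_prompt("Text: I love this library Start: 0.0")
```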